[CT] Fix CT Config to honor `fp8_inc` KV cache dtype by yiliu30 · Pull Request #929 · vllm-project/vllm-gaudi

yiliu30 · 2026-02-04T11:16:12Z

Adapt to the update in vllm-project/vllm#30141:

        # llm-compressor models need to set cache_dtype to "fp8" manually.
        if getattr(quant_config, "kv_cache_scheme", None) is not None:
            kv_cache_dtype = "fp8"
            calculate_kv_scales = False
            if cache_config is not None:
                cache_config.cache_dtype = "fp8"
                cache_config.calculate_kv_scales = False

        self.kv_cache_torch_dtype = kv_cache_dtype_str_to_dtype(
            kv_cache_dtype, vllm_config.model_config
        )
        self.kv_cache_dtype = kv_cache_dtype

cc @hshen14 @thuang6 @lkk12014402

Signed-off-by: yiliu30 <yi4.liu@intel.com>

Copilot

Pull request overview

This PR fixes a configuration issue in the Compressed Tensors implementation for HPU (Habana Processing Unit) so that it properly handles the `fp8_inc` KV cache dtype instead of the default `fp8` format.

Changes:

Added a custom `__init__` method to `HPUCompressedTensorsConfig` that overrides KV cache settings after parent initialization
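
The PR's diff is not reproduced in this thread, so the following is only an illustrative sketch of the pattern described above: a subclass runs the parent initializer and then adjusts its KV cache settings so that a check like the one quoted in the description no longer forces `"fp8"`. `CacheConfig`, `BaseQuantConfig`, `HpuQuantConfig`, and `resolve_kv_cache_dtype` are hypothetical stand-ins, not vLLM or vllm-gaudi classes, and clearing `kv_cache_scheme` is an assumption about one way the override could work, not the confirmed implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheConfig:
    """Hypothetical stand-in for a cache config; not vLLM's real class."""
    cache_dtype: str = "auto"
    calculate_kv_scales: bool = False


class BaseQuantConfig:
    """Hypothetical stand-in for the upstream compressed-tensors config."""

    def __init__(self, kv_cache_scheme: Optional[dict] = None):
        self.kv_cache_scheme = kv_cache_scheme


class HpuQuantConfig(BaseQuantConfig):
    """Hypothetical subclass that overrides KV cache settings after parent init."""

    def __init__(self, kv_cache_scheme: Optional[dict] = None):
        super().__init__(kv_cache_scheme)
        # Assumption: clearing the scheme keeps the generic check below
        # from replacing the user's requested dtype with "fp8".
        self.kv_cache_scheme = None


def resolve_kv_cache_dtype(quant_config, cache_config: CacheConfig) -> str:
    """Mirrors the check quoted in the PR description."""
    kv_cache_dtype = cache_config.cache_dtype
    if getattr(quant_config, "kv_cache_scheme", None) is not None:
        kv_cache_dtype = "fp8"
        cache_config.cache_dtype = "fp8"
        cache_config.calculate_kv_scales = False
    return kv_cache_dtype


if __name__ == "__main__":
    scheme = {"num_bits": 8}  # made-up scheme contents, for illustration only
    print(resolve_kv_cache_dtype(BaseQuantConfig(scheme), CacheConfig("fp8_inc")))
    print(resolve_kv_cache_dtype(HpuQuantConfig(scheme), CacheConfig("fp8_inc")))
```

With the base config, a requested `fp8_inc` is replaced by `fp8`; with the subclass in this sketch, the requested value is left untouched.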

vllm_gaudi/ops/hpu_compressed_tensors.py

github-actions · 2026-02-05T02:38:07Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
17b17c068453e6dc6af79240bb94857ae175cc51

yiliu30 added 2 commits February 4, 2026 11:09

Revert LLMC override

60223ea

Signed-off-by: yiliu30 <yi4.liu@intel.com>

fix

329c363

Signed-off-by: yiliu30 <yi4.liu@intel.com>

yiliu30 requested a review from xuechendi as a code owner February 4, 2026 11:16

Copilot AI review requested due to automatic review settings February 4, 2026 11:16

yiliu30 requested review from adobrzyn, afierka-intel, iboiko-habana, kamil-kaczor, ksmusz, mgawarkiewicz-intel and michalkuligowski as code owners February 4, 2026 11:16

Copilot AI reviewed Feb 4, 2026

yiliu30 mentioned this pull request Feb 4, 2026

[CT] Add FP8 GQA Support #874

Open

github-actions bot mentioned this pull request Feb 4, 2026

🚦 Team Review Dashboard #701

Open

Merge branch 'main' into fix-llmc-kv

6b0eac3

xuechendi approved these changes Feb 5, 2026

xuechendi merged commit 175572b into vllm-project:main Feb 5, 2026
55 checks passed

xuechendi merged 3 commits into vllm-project:main from yiliu30:fix-llmc-kv
